# DPO Optimization
Zhi Writing Dsr1 14b
Apache-2.0
A creative-writing enhancement model fine-tuned from DeepSeek-R1-Distill-Qwen-14B, showing significant improvements on creative-writing tasks.
Large Language Model
Transformers Supports Multiple Languages

Zhihu-ai
133
16
Slam
MIT
A speech language model built on discrete HuBERT tokens, focusing on efficient training and capable of generating continuations of speech segments.
Audio Generation
Transformers

slprl
115
10
SummLlama3.1 8B
SummLlama3.1-8B is a text summarization model initialized from Llama3.1-8B-Instruct and optimized with Direct Preference Optimization (DPO) on large-scale summarization feedback, excelling in faithfulness, completeness, and conciseness.
Text Generation
Transformers

DISLab
116
10
UNA ThePitbull 21.4B V2
UNA-ThePitbull-21.4B-v2 is a 21.4B-parameter large language model with performance approaching 70B-class models, combining emotional intelligence (EQ) and IQ, and excelling in dialogue and text generation.
Large Language Model
Transformers

fblgit
16
16
Llama3 OpenBioLLM 70B
OpenBioLLM-70B is an advanced open-source language model designed for the biomedical domain, fine-tuned from Meta-Llama-3-70B-Instruct, with outstanding performance on biomedical tasks.
Large Language Model
Transformers Supports Multiple Languages

aaditya
18.35k
428
SambaLingo Hungarian Chat
A human-preference-aligned chat model supporting Hungarian and English, adapted from Llama-2-7b for Hungarian.
Large Language Model
Transformers Supports Multiple Languages

sambanovasystems
154
43
LLaVA V1.5 13B DPO GGUF
LLaVA-v1.5-13B-DPO is a vision-language model based on the LLaVA framework, trained with Direct Preference Optimization (DPO) and converted to GGUF quantized format to improve inference efficiency.
Image-to-Text
antiven0m
30
0
Bloom 1b1 Zh Error Correction Dpo
A Chinese text error correction model trained with DPO, capable of automatically detecting and correcting spelling and grammar errors in Chinese text.
Large Language Model
Transformers Chinese

p208p2002
15
1
UNA TheBeagle 7b V1
TheBeagle is a 7-billion-parameter model trained on The Bagel dataset and optimized with DPO (Direct Preference Optimization) and UNA (Uniform Neural Alignment), demonstrating excellent performance across multi-task scenarios.
Large Language Model
Transformers

fblgit
88
37
Rocket 3B
Rocket-3B is a 3-billion-parameter large language model trained on public datasets with Direct Preference Optimization (DPO), outperforming many larger models.
Large Language Model
Transformers English

pansophic
26
85
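Several of the models listed above were aligned with Direct Preference Optimization (DPO). As a rough illustration of the technique (a minimal sketch, not taken from any of these models' training code), the per-pair DPO loss can be computed from the summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair.

    Arguments are the summed token log-probabilities of the chosen and
    rejected responses under the policy being trained and under the
    frozen reference model; beta controls how far the policy may drift
    from the reference.
    """
    pi_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = beta * (pi_logratio - ref_logratio)
    # loss = -log(sigmoid(logits)), evaluated in a numerically stable way
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# When the policy matches the reference, the loss is log(2) ≈ 0.693;
# it falls below that as the policy ranks the chosen response higher.
```

In practice, libraries such as Hugging Face TRL compute this loss over batches of tokenized pairs; the scalar version above only shows the shape of the objective.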